Abstract. We consider the set $\Sigma(R, C)$ of all $m \times n$ matrices having 0-1 entries and
prescribed row sums $R = (r_1, \ldots, r_m)$ and column sums $C = (c_1, \ldots, c_n)$. We prove
an asymptotic estimate for the cardinality $|\Sigma(R, C)|$ via the solution to a convex
optimization problem. We show that if $\Sigma(R, C)$ is sufficiently large, then a random
matrix $D \in \Sigma(R, C)$ sampled from the uniform probability measure in $\Sigma(R, C)$ is, with
high probability, close to a particular matrix $Z = Z(R, C)$ that maximizes the
sum of entropies of entries among all matrices with row sums $R$, column sums $C$ and
entries between 0 and 1. Similar results are obtained for 0-1 matrices with prescribed
row and column sums and assigned zeros in some positions.



1. Introduction and main results 



Matrices with 0-1 entries and prescribed row and column sums are classical
objects which appear in many branches of pure and applied mathematics. In
combinatorics, such matrices encode hypergraphs with prescribed degrees of vertices
and related structures, see, for example, [LW01]. In algebra, certain structural
constants in the ring of symmetric functions and, consequently, in the
representation theory of the symmetric and general linear groups are expressed as numbers
of 0-1 matrices with prescribed row and column sums, see Chapter 1 of [Ma95]. In
statistics, 0-1 matrices with prescribed row and column sums are known as binary
contingency tables, see [C+05].



Let $R = (r_1, \ldots, r_m)$ be a positive integer $m$-vector and let $C = (c_1, \ldots, c_n)$ be
a positive integer $n$-vector such that

\[ \sum_{i=1}^{m} r_i = \sum_{j=1}^{n} c_j = N \]

and

\[ 0 < r_i < n \ \text{ for } i = 1, \ldots, m \quad \text{and} \quad 0 < c_j < m \ \text{ for } j = 1, \ldots, n. \]

Let $\Sigma(R, C)$ be the set of all $m \times n$ matrices (binary contingency tables) $D = (d_{ij})$
such that

\[ \sum_{j=1}^{n} d_{ij} = r_i \ \text{ for } i = 1, \ldots, m, \qquad \sum_{i=1}^{m} d_{ij} = c_j \ \text{ for } j = 1, \ldots, n \qquad \text{and} \qquad d_{ij} \in \{0, 1\}. \]

In words: $\Sigma(R, C)$ is the set of 0-1 matrices with row sums $R$ and column sums $C$.
Vectors $R$ and $C$ are called the margins of a matrix $D \in \Sigma(R, C)$.
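
For very small margins the set of 0-1 matrices with row sums $R$ and column sums $C$ can be enumerated directly, which gives a reference value against which estimates can be compared. The following is a minimal brute-force sketch; the function name `count_binary_tables` is ours and the approach is exponential in the size of the table, so it is meant only for toy examples.

```python
from itertools import combinations, product

def count_binary_tables(R, C):
    """Count 0-1 matrices with row sums R and column sums C by brute force."""
    n = len(C)
    # Row i is determined by the set of r_i columns that receive a 1.
    row_choices = [list(combinations(range(n), r)) for r in R]
    count = 0
    for rows in product(*row_choices):
        col_sums = [0] * n
        for cols in rows:
            for j in cols:
                col_sums[j] += 1
        if col_sums == list(C):
            count += 1
    return count

# 3 x 3 permutation matrices: all margins equal to 1.
print(count_binary_tables((1, 1, 1), (1, 1, 1)))
```

For margins $R = C = (1, 1, 1)$ the matrices in question are exactly the $3 \times 3$ permutation matrices.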

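Our first main result provides an estimate of the cardinality of $\Sigma(R, C)$.

(1.1) Theorem. Let us define the function

\[ F(x, y) = \left( \prod_{i=1}^{m} x_i^{-r_i} \right) \left( \prod_{j=1}^{n} y_j^{-c_j} \right) \prod_{i,j} (1 + x_i y_j) \]

for $x = (x_1, \ldots, x_m)$ and $y = (y_1, \ldots, y_n)$

and let

\[ \alpha(R, C) = \inf_{\substack{x_1, \ldots, x_m > 0 \\ y_1, \ldots, y_n > 0}} F(x, y). \]

Then for the number $|\Sigma(R, C)|$ of $m \times n$ zero-one matrices with row sums $R$ and
column sums $C$ we have

\[ \alpha(R, C) \ \geq \ |\Sigma(R, C)| \ \geq \ \frac{(mn)!}{(mn)^{mn}} \left( \prod_{i=1}^{m} \frac{(n - r_i)^{n - r_i}}{(n - r_i)!} \right) \left( \prod_{j=1}^{n} \frac{c_j^{c_j}}{c_j!} \right) \alpha(R, C). \]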
Let us estimate the ratio between the lower and the upper bounds for $|\Sigma(R, C)|$
using Stirling's formula

\[ \frac{s!}{s^s} = e^{-s} \sqrt{2 \pi s} \left( 1 + O(s^{-1}) \right). \]

Since

\[ \sum_{i=1}^{m} (n - r_i) + \sum_{j=1}^{n} c_j = mn, \]

the "$e^{-s}$" contributions from Stirling's formula cancel each other out and we
obtain

\[ \alpha(R, C) \ \geq \ |\Sigma(R, C)| \ \geq \ (mn)^{-\gamma (m + n)} \alpha(R, C) \]

for some absolute constant $\gamma > 0$.

We note that in many interesting cases we have $|\Sigma(R, C)| = 2^{\Omega(mn)}$, see also
Section 3.1, in which case the estimate of Theorem 1.1 captures the logarithmic
order of $|\Sigma(R, C)|$.

Let us substitute $x_i = e^{s_i}$, $y_j = e^{t_j}$ in $F(x, y)$. Then $\ln F(x, y) = G(s, t)$, where

\[ G(s, t) = - \sum_{i=1}^{m} r_i s_i - \sum_{j=1}^{n} c_j t_j + \sum_{i,j} \ln \left( 1 + e^{s_i + t_j} \right) \]

for $s = (s_1, \ldots, s_m)$ and $t = (t_1, \ldots, t_n)$.

One can observe that $G(s, t)$ is a convex function on $\mathbb{R}^m \times \mathbb{R}^n$, hence to compute the
infimum of $G(s, t)$ one can use any of the efficient convex optimization algorithms,
see, for example, [NN94].
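
As an illustration of how the infimum can be approximated in practice, here is a minimal sketch that runs plain gradient descent on $G(s, t) = -\sum_i r_i s_i - \sum_j c_j t_j + \sum_{i,j} \ln(1 + e^{s_i + t_j})$. Gradient descent is our choice for brevity and is not one of the interior-point methods of [NN94]; the function name `minimize_G`, the iteration count and the step size $4/(m+n)$ (based on a standard bound of $(m+n)/4$ on the norm of the Hessian of $G$) are assumptions of this sketch.

```python
import math

def minimize_G(R, C, iters=2000):
    """Approximate inf G(s, t) by gradient descent; returns (G, s, t)."""
    m, n = len(R), len(C)
    s, t = [0.0] * m, [0.0] * n
    step = 4.0 / (m + n)  # assumed safe step: Hessian norm is at most (m + n) / 4

    def sigmoid(u):
        return 1.0 / (1.0 + math.exp(-u))

    for _ in range(iters):
        # p[i][j] = e^{s_i + t_j} / (1 + e^{s_i + t_j})
        p = [[sigmoid(s[i] + t[j]) for j in range(n)] for i in range(m)]
        grad_s = [sum(p[i]) - R[i] for i in range(m)]
        grad_t = [sum(p[i][j] for i in range(m)) - C[j] for j in range(n)]
        s = [s[i] - step * grad_s[i] for i in range(m)]
        t = [t[j] - step * grad_t[j] for j in range(n)]

    G = (-sum(r * si for r, si in zip(R, s))
         - sum(c * tj for c, tj in zip(C, t))
         + sum(math.log1p(math.exp(s[i] + t[j]))
               for i in range(m) for j in range(n)))
    return G, s, t

# Toy margins R = C = (1, 1, 1); exp(ln_alpha) approximates alpha(R, C).
ln_alpha, s, t = minimize_G((1, 1, 1), (1, 1, 1))
print(ln_alpha, math.exp(ln_alpha))
```

For these symmetric margins the minimum can be found by hand ($s_i + t_j = -\ln 2$ for all $i, j$), which gives $\ln \alpha(R, C) = 9 \ln 3 - 6 \ln 2$; the value $\alpha(R, C) \approx 307.5$ is indeed an upper bound for the 6 permutation matrices of size 3.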

Suppose that the margins $R, C$ are such that the set $\Sigma(R, C)$ is not empty and let
us consider $\Sigma(R, C)$ as a finite probability space with the uniform measure. Let
us pick a random matrix $D \in \Sigma(R, C)$. What is $D$ likely to look like? This
question is of some interest to statistics: a binary contingency table $D = (d_{ij})$
may represent certain statistical data (for example, $d_{ij}$ may be equal to 1 or 0
depending on whether or not Darwin finches of the $i$-th species can be found on the
$j$-th Galapagos island, as in [C+05]). One can condition on the row and column
sums and ask what is special about a particular table $D \in \Sigma(R, C)$, considering all
tables in $\Sigma(R, C)$ as equiprobable, see [C+05]. To answer this question we need to
know what a random table $D \in \Sigma(R, C)$ looks like.

We prove that with high probability $D$ is close to a particular matrix $Z$ with
row sums $R$ and column sums $C$ and entries between 0 and 1, which we call the
maximum entropy matrix.

(1.2) The maximum entropy matrix. For $0 \leq x \leq 1$ let us consider the entropy
function

\[ H(x) = x \ln \frac{1}{x} + (1 - x) \ln \frac{1}{1 - x}. \]

As is known, $H$ is a strictly concave function with $H(0) = H(1) = 0$.

For an $m \times n$ matrix $X = (x_{ij})$ such that $0 \leq x_{ij} \leq 1$ for all $i, j$ we define

\[ H(X) = \sum_{i,j} H(x_{ij}). \]

Assume that $\Sigma(R, C)$ is non-empty. Let us consider the polytope $\mathcal{P}(R, C)$ of
matrices $X = (x_{ij})$ such that

\[ \sum_{j=1}^{n} x_{ij} = r_i \ \text{ for } i = 1, \ldots, m, \qquad \sum_{i=1}^{m} x_{ij} = c_j \ \text{ for } j = 1, \ldots, n \qquad \text{and} \qquad 0 \leq x_{ij} \leq 1 \ \text{ for all } i, j. \]

Since $H(X)$ is strictly concave, it attains a unique maximum $Z = Z(R, C)$ on
$\mathcal{P}(R, C)$, which we call the maximum entropy matrix with margins $(R, C)$.

For example, if all $r_i$ are equal, then by the symmetry argument we must have
$Z = (z_{ij})$ where $z_{ij} = c_j / m$ for all $i, j$.

The following observation characterizes the maximum entropy matrix as the
solution to the problem that is convex dual to the optimization problem of Theorem
1.1.

(1.3) Lemma. Suppose that the polytope $\mathcal{P}(R, C)$ has a non-empty interior, that
is, contains a matrix $Y = (y_{ij})$ such that $0 < y_{ij} < 1$ for all $i, j$. Then the infimum
$\alpha(R, C)$ in Theorem 1.1 is attained at a particular point $x^* = (\xi_1, \ldots, \xi_m)$ and
$y^* = (\eta_1, \ldots, \eta_n)$. For the maximum entropy matrix $Z = (z_{ij})$ we have

\[ z_{ij} = \frac{\xi_i \eta_j}{1 + \xi_i \eta_j} \quad \text{for all } i, j \tag{1.3.1} \]

and, moreover,

\[ \alpha(R, C) = e^{H(Z)}. \tag{1.3.2} \]

Conversely, if the infimum $\alpha(R, C)$ in Theorem 1.1 is attained at a certain point
$x^* = (\xi_1, \ldots, \xi_m)$ and $y^* = (\eta_1, \ldots, \eta_n)$ then for the maximum entropy matrix
$Z = (z_{ij})$ equations (1.3.1) and (1.3.2) hold.
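
The relations $z_{ij} = \xi_i \eta_j / (1 + \xi_i \eta_j)$ and $\alpha(R, C) = e^{H(Z)}$ can be checked by hand on the equal-row-sums example of Section 1.2, where $z_{ij} = c_j / m$. The sketch below does this numerically for one hypothetical choice of margins, $R = (2, 2, 2)$ and $C = (2, 2, 1, 1)$; the candidate point $\xi_i = 1$, $\eta_j = c_j / (m - c_j)$ is our own guess obtained by solving $z_{ij} = \xi_i \eta_j / (1 + \xi_i \eta_j)$ and is specific to this example.

```python
import math

def H(x):
    """Entropy x ln(1/x) + (1 - x) ln(1/(1 - x)), with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return x * math.log(1 / x) + (1 - x) * math.log(1 / (1 - x))

R, C = (2, 2, 2), (2, 2, 1, 1)      # all row sums equal, so z_ij = c_j / m
m, n = len(R), len(C)
Z = [[C[j] / m for j in range(n)] for i in range(m)]

# Candidate minimizer (an assumption of this sketch): xi_i = 1, eta_j = c_j / (m - c_j).
xi = [1.0] * m
eta = [C[j] / (m - C[j]) for j in range(n)]

# ln F(xi, eta) for F(x, y) = prod x_i^{-r_i} * prod y_j^{-c_j} * prod (1 + x_i y_j)
ln_F = (-sum(R[i] * math.log(xi[i]) for i in range(m))
        - sum(C[j] * math.log(eta[j]) for j in range(n))
        + sum(math.log(1 + xi[i] * eta[j]) for i in range(m) for j in range(n)))
entropy_Z = sum(H(Z[i][j]) for i in range(m) for j in range(n))

print(ln_F, entropy_Z)   # the two numbers agree up to rounding
```

The agreement of `ln_F` and `entropy_Z` reflects a general identity: for any positive $\xi, \eta$ and $z_{ij} = \xi_i \eta_j / (1 + \xi_i \eta_j)$ one has $H(Z) = \ln F(\xi, \eta)$ when $R$ and $C$ are the row and column sums of $Z$.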

The condition that the polytope $\mathcal{P}(R, C)$ has a non-empty interior is equivalent
to the requirement that for every choice of $1 \leq k \leq m$ and $1 \leq l \leq n$ there is
a matrix $D^0 \in \Sigma(R, C)$, $D^0 = (d^0_{ij})$, such that $d^0_{kl} = 0$ and there is a matrix
$D^1 \in \Sigma(R, C)$, $D^1 = (d^1_{ij})$, such that $d^1_{kl} = 1$. One can take $Y$ to be the average
of all matrices $D \in \Sigma(R, C)$. In other words, we require the set $\Sigma(R, C)$ to be
reasonably large. We also observe that if $r_i c_j < N$ for all $i, j$ (recall that $N$ is the
total sum of the matrix entries) one can choose $y_{ij} = r_i c_j / N$.

We prove that with high probability a random matrix $D \in \Sigma(R, C)$ is close to
the maximum entropy matrix $Z$ as far as sums over subsets of entries are concerned.
For a subset

\[ S \subset \{ (i, j) : \ i = 1, \ldots, m, \ j = 1, \ldots, n \} \]

and an $m \times n$ matrix $A = (a_{ij})$, let us denote

\[ \sigma_S(A) = \sum_{(i,j) \in S} a_{ij} \]

the sum of the entries of $A$ indexed by $S$.

In what follows, we are interested in the case of the density $N/mn$ separated
from 0. Without loss of generality, we assume that $n \geq m$.

(1.4) Theorem. Let us fix numbers $\kappa > 0$ and $0 < \delta < 1$. Then there exists a
number $q = q(\kappa, \delta)$ such that the following holds.

Let $(R, C)$ be margins such that $n \geq m > q$ and the polytope $\mathcal{P}(R, C)$ has a
non-empty interior, and let $Z \in \mathcal{P}(R, C)$ be the maximum entropy matrix. Let
$S \subset \{ (i, j) : \ i = 1, \ldots, m, \ j = 1, \ldots, n \}$ be a subset such that $\sigma_S(Z) \geq \delta mn$

and let 



Inn 




If $\varepsilon \leq 1$ then

\[ \Pr \left\{ D \in \Sigma(R, C) : \ (1 - \varepsilon) \sigma_S(Z) \leq \sigma_S(D) \leq (1 + \varepsilon) \sigma_S(Z) \right\} \ \geq \ 1 - 2 n^{-\kappa n}. \]

Let us associate with a non-negative, non-zero $m \times n$ matrix $A = (a_{ij})$ a finite
probability space on the ground set $\{ (i, j) : \ i = 1, \ldots, m, \ j = 1, \ldots, n \}$ with
$\Pr \{ (i, j) \} = a_{ij} / N$, where $N > 0$ is the total sum of matrix entries. Theorem 1.4
asserts that the probability space associated with the maximum entropy matrix $Z$
reasonably well approximates the probability space associated with a random binary
contingency table $D \in \Sigma(R, C)$ as far as events $S$ whose probability is separated
from 0 are concerned.

The following interpretation of the maximum entropy matrix was suggested to 
the author by J.A. Hartigan, see [BH09]. 

(1.5) Theorem. Let $Z = (z_{ij})$ be the $m \times n$ maximum entropy matrix with margins
$(R, C)$ and let us suppose that the polytope $\mathcal{P}(R, C)$ has a non-empty interior. Let
$X = (x_{ij})$ be the random $m \times n$ matrix of independent Bernoulli random variables
such that

\[ \mathbf{E} X = Z. \]

In other words, $\Pr \{ x_{ij} = 1 \} = z_{ij}$ and $\Pr \{ x_{ij} = 0 \} = 1 - z_{ij}$ independently for
all $i, j$. Then the probability mass function of $X$ is constant on the set $\Sigma(R, C)$ of
binary contingency tables with margins $(R, C)$, and, moreover,

\[ \Pr \{ X = D \} = e^{-H(Z)} \quad \text{for all } D \in \Sigma(R, C). \]

The distribution of the random matrix $X$ in Theorem 1.5 can be characterized
as the maximum entropy distribution in the class consisting of all probability
distributions on the set $\{0, 1\}^{m \times n}$ of matrices with 0-1 entries whose expectations lie
in the affine subspace consisting of the matrices with row sums $R$ and column sums
$C$, see [BH09].
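
The mechanism behind Theorem 1.5 can be seen on a toy example: whenever the success probabilities have the product form $z_{ij} = \xi_i \eta_j / (1 + \xi_i \eta_j)$, the probability of a 0-1 matrix $D$ equals $\prod_i \xi_i^{r_i(D)} \prod_j \eta_j^{c_j(D)} / \prod_{i,j} (1 + \xi_i \eta_j)$ and therefore depends only on the margins of $D$. The sketch below checks this by exhaustive enumeration; the values of `xi` and `eta` are arbitrary illustrative numbers, not the solution of any particular optimization problem.

```python
import math
from itertools import product

xi, eta = [0.5, 2.0], [1.0, 3.0, 0.25]   # arbitrary positive numbers (illustrative)
m, n = len(xi), len(eta)
Z = [[xi[i] * eta[j] / (1 + xi[i] * eta[j]) for j in range(n)] for i in range(m)]

def prob(D):
    """Pr{X = D} for independent Bernoulli entries with E X = Z."""
    return math.prod(Z[i][j] if D[i][j] else 1 - Z[i][j]
                     for i in range(m) for j in range(n))

def margins(D):
    """Row sums and column sums of D."""
    return tuple(map(sum, D)), tuple(map(sum, zip(*D)))

# Group all 0-1 matrices by their margins and record the probabilities seen.
by_margins = {}
for bits in product((0, 1), repeat=m * n):
    D = [bits[i * n:(i + 1) * n] for i in range(m)]
    by_margins.setdefault(margins(D), []).append(prob(D))

# Largest variation of Pr{X = D} within a class of matrices with equal margins.
spread = max(max(ps) - min(ps) for ps in by_margins.values())
print(spread)
```

The printed spread is zero up to floating-point rounding, so the mass function is constant on every class of matrices sharing the same margins.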
2. Extensions and ramifications

Our results hold in somewhat greater generality. Let us fix an $m \times n$
non-negative matrix $W = (w_{ij})$, which we call the matrix of weights. Let us consider
the following partition function

\[ |\Sigma(R, C; W)| = \sum_{\substack{D \in \Sigma(R, C) \\ D = (d_{ij})}} \ \prod_{\substack{i, j: \\ d_{ij} = 1}} w_{ij}. \]

In particular, if $w_{ij} = 1$ for all $i, j$ then $|\Sigma(R, C; W)| = |\Sigma(R, C)|$. If $w_{ij} \in \{0, 1\}$
then the partition function counts binary contingency tables with zeros assigned to
some positions: the value of $|\Sigma(R, C; W)|$ is equal to the number of $m \times n$ matrices
$D = (d_{ij})$ such that the row sums of $D$ are $R$, the column sums of $D$ are $C$, $d_{ij} \in \{0, 1\}$
for all $i, j$ and, additionally, $d_{ij} = 0$ if $w_{ij} = 0$. In combinatorial terms, the
set $\Sigma(R, C; W)$ can be interpreted as the set of all subgraphs with prescribed degrees
of vertices of a given bipartite graph. Binary contingency tables with preassigned
zeros are of interest in statistics, see [C+05].

We prove the following result.

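(2.1) Theorem. Let us define the function

\[ F(x, y; W) = \left( \prod_{i=1}^{m} x_i^{-r_i} \right) \left( \prod_{j=1}^{n} y_j^{-c_j} \right) \prod_{i,j} (1 + w_{ij} x_i y_j) \]

for $x = (x_1, \ldots, x_m)$ and $y = (y_1, \ldots, y_n)$

and let

\[ \alpha(R, C; W) = \inf_{\substack{x_1, \ldots, x_m > 0 \\ y_1, \ldots, y_n > 0}} F(x, y; W). \]

Then for the partition function $|\Sigma(R, C; W)|$ we have

\[ \alpha(R, C; W) \ \geq \ |\Sigma(R, C; W)| \ \geq \ \frac{(mn)!}{(mn)^{mn}} \left( \prod_{i=1}^{m} \frac{(n - r_i)^{n - r_i}}{(n - r_i)!} \right) \left( \prod_{j=1}^{n} \frac{c_j^{c_j}}{c_j!} \right) \alpha(R, C; W). \]

As before, the function obtained as the result of the substitution $x_i = e^{s_i}$, $y_j = e^{t_j}$
in $\ln F(x, y; W)$,

\[ G(s, t; W) = - \sum_{i=1}^{m} r_i s_i - \sum_{j=1}^{n} c_j t_j + \sum_{i,j} \ln \left( 1 + w_{ij} e^{s_i + t_j} \right) \]

for $s = (s_1, \ldots, s_m)$ and $t = (t_1, \ldots, t_n)$,

is convex on $\mathbb{R}^m \times \mathbb{R}^n$, hence computing $\alpha(R, C; W)$ is a convex optimization
problem.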

Let us assume now that $w_{ij} \in \{0, 1\}$ for all $i, j$ and let us consider the set
$\Sigma(R, C; W)$ of all $m \times n$ binary contingency tables $D = (d_{ij})$ with the additional
constraint that $d_{ij} = 0$ if $w_{ij} = 0$. Assuming that $\Sigma(R, C; W)$ is not empty, we
consider this set as a finite probability space with the uniform measure. We call
the matrix $W$ the pattern. We are interested in what a random table $D \in \Sigma(R, C; W)$
looks like. We define the maximum entropy matrix as before.

(2.2) The maximum entropy matrix. Suppose that the set $\Sigma(R, C; W)$ is
non-empty. Let us consider the polytope $\mathcal{P}(R, C; W)$ of $m \times n$ matrices $X = (x_{ij})$ such
that

\[ \sum_{j=1}^{n} x_{ij} = r_i \ \text{ for } i = 1, \ldots, m, \qquad \sum_{i=1}^{m} x_{ij} = c_j \ \text{ for } j = 1, \ldots, n, \]

\[ 0 \leq x_{ij} \leq 1 \ \text{ for all } i, j \quad \text{and} \quad x_{ij} = 0 \ \text{ whenever } w_{ij} = 0. \]

Thus $\mathcal{P}(R, C; W)$ is a face of the polytope $\mathcal{P}(R, C)$ of Section 1.2.

Let $H(X)$ be the entropy function of Section 1.2. Since $H(X)$ is strictly concave,
it attains a unique maximum $Z = Z(R, C; W)$ on the polytope $\mathcal{P}(R, C; W)$, which we
call the maximum entropy matrix with margins $(R, C)$ and pattern $W$.

(2.3) Lemma. Suppose that the polytope $\mathcal{P}(R, C; W)$ contains a matrix $Y = (y_{ij})$
such that $0 < y_{ij} < 1$ whenever $w_{ij} = 1$, in which case we say that $\mathcal{P}(R, C; W)$ has
a non-empty interior. Then the infimum $\alpha(R, C; W)$ in Theorem 2.1 is attained at
a certain point $x^* = (\xi_1, \ldots, \xi_m)$ and $y^* = (\eta_1, \ldots, \eta_n)$. The maximum entropy
matrix $Z = (z_{ij})$ satisfies

\[ z_{ij} = \frac{\xi_i \eta_j}{1 + \xi_i \eta_j} \quad \text{for all } i, j \text{ such that } w_{ij} = 1. \tag{2.3.1} \]

Moreover,

\[ \alpha(R, C; W) = e^{H(Z)}. \tag{2.3.2} \]

Conversely, if the infimum $\alpha(R, C; W)$ is attained at a point $x^* = (\xi_1, \ldots, \xi_m)$ and
$y^* = (\eta_1, \ldots, \eta_n)$, then for the maximum entropy matrix $Z = (z_{ij})$ the equations
(2.3.1) and (2.3.2) hold.

For $\mathcal{P}(R, C; W)$ to have a non-empty interior is equivalent to the requirement
that for every pair $k, l$ such that $w_{kl} = 1$ there is a matrix $D^0 \in \Sigma(R, C; W)$,
$D^0 = (d^0_{ij})$, such that $d^0_{kl} = 0$ and there is a matrix $D^1 \in \Sigma(R, C; W)$, $D^1 = (d^1_{ij})$,
such that $d^1_{kl} = 1$. In other words, we require the set $\Sigma(R, C; W)$ to be reasonably
large.

We prove an analogue of Theorem 1.4. We consider subsets

\[ S \subset \{ (i, j) : \ w_{ij} = 1 \}. \]

As before, we denote by $\sigma_S(A)$ the sum of the entries of a matrix $A$ indexed by the
subset $S$.

(2.4) Theorem. Let us fix numbers $\kappa > 0$ and $0 < \delta < 1$. Then there exists a
number $q = q(\kappa, \delta)$ such that the following holds.

Let $(R, C)$ be margins such that $n \geq m > q$ and the polytope $\mathcal{P}(R, C; W)$ has a
non-empty interior, and let $Z \in \mathcal{P}(R, C; W)$ be the maximum entropy matrix. Let
$S \subset \{ (i, j) : \ w_{ij} = 1 \}$ be a subset such that $\sigma_S(Z) \geq \delta mn$ and let




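If $\varepsilon \leq 1$ then

\[ \Pr \left\{ D \in \Sigma(R, C; W) : \ (1 - \varepsilon) \sigma_S(Z) \leq \sigma_S(D) \leq (1 + \varepsilon) \sigma_S(Z) \right\} \ \geq \ 1 - 2 n^{-\kappa n}. \]

The statement of the theorem is, of course, vacuous unless the pattern $W$ contains
$\Omega(mn)$ ones.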

There is an analogue of Theorem 1.5. 

(2.5) Theorem. Suppose that the polytope $\mathcal{P}(R, C; W)$ has a non-empty interior
and let $Z \in \mathcal{P}(R, C; W)$ be the maximum entropy matrix. Let $X = (x_{ij})$ be the
random $m \times n$ matrix of independent Bernoulli random variables such that

\[ \mathbf{E} X = Z, \]

that is, $\Pr \{ x_{ij} = 1 \} = z_{ij}$ and $\Pr \{ x_{ij} = 0 \} = 1 - z_{ij}$ independently for all $i, j$.
Then the probability mass function of $X$ is constant on the set $\Sigma(R, C; W)$ and,
moreover,

\[ \Pr \{ X = D \} = e^{-H(Z)} \quad \text{for all } D \in \Sigma(R, C; W). \]

3. Comparisons with the literature 

There is a vast literature on 0-1 matrices with prescribed row and column sums 
and with or without zeros in prescribed positions, see for example, Chapter 16 
of [LW01], [Ne69], [Be74], [GC77], recent [CM05], [G+06], [C+08], [GM09] and 
references therein. A simple and efficient criterion for the existence of a 0-1 matrix 
with prescribed row and column sums is given by the classical Gale-Ryser Theorem; 
in the case of enforced zeros, the question reduces to the existence of a network flow, 
see for example, Chapter 16 of [LW01]. Estimating the number of such matrices has also
attracted a lot of attention. Precise asymptotic formulas for the number of matrices
were obtained in sparse cases for which $r_i \ll n$ and $c_j \ll m$ [Ne69], [Be74], [G+06],
the regular case of all row sums $r_i$ equal and all column sums $c_j$ equal [C+08] and
cases close to regular [C+08], [GM09]. Formulas of Theorems 1.1 and 2.1 are not so
precise but they are applicable to a wide class of margins $(R, C)$ and they uncover
some interesting features of the numbers $|\Sigma(R, C)|$ and $|\Sigma(R, C; W)|$.

The following construction provides some insight into the combinatorial
interpretation of the number $\alpha(R, C)$ from Theorem 1.1.

(3.1) Cloning the margins. Let us fix some margins $R, C$ for which the set
$\Sigma(R, C)$ is not empty, and, moreover, the polytope $\mathcal{P}(R, C)$ contains an interior
point, so the conditions of Lemma 1.3 are satisfied. Let $R = (r_1, \ldots, r_m)$ and
$C = (c_1, \ldots, c_n)$. For a positive integer $k$, let us define the $km$-vector

\[ R_k = \Bigl( \underbrace{k r_1, \ldots, k r_1}_{k \text{ times}}, \ \ldots, \ \underbrace{k r_m, \ldots, k r_m}_{k \text{ times}} \Bigr) \]

and the $kn$-vector

\[ C_k = \Bigl( \underbrace{k c_1, \ldots, k c_1}_{k \text{ times}}, \ \ldots, \ \underbrace{k c_n, \ldots, k c_n}_{k \text{ times}} \Bigr). \]

In other words, we obtain margins $(R_k, C_k)$ if we choose a matrix $Y \in \mathcal{P}(R, C)$
and then create a new block matrix $Y_k$ by arranging $k^2$ copies of $Y$ into a $km \times kn$
matrix. Then $R_k$ is the vector of row sums of $Y_k$ and $C_k$ is the vector of column
sums of $Y_k$. Clearly, the conditions of Lemma 1.3 are satisfied for $(R_k, C_k)$.
Theorem 1.1 then implies that

\[ \lim_{k \to +\infty} |\Sigma(R_k, C_k)|^{1/k^2} = \alpha(R, C). \tag{3.1.1} \]

Indeed, the infimum $\alpha(R, C)$ is attained at a certain point

\[ x^* = (\xi_1, \ldots, \xi_m) \quad \text{and} \quad y^* = (\eta_1, \ldots, \eta_n). \]

It is not hard to see that the infimum $\alpha(R_k, C_k)$ is attained at

\[ x_k = \Bigl( \underbrace{\xi_1, \ldots, \xi_1}_{k \text{ times}}, \ \ldots, \ \underbrace{\xi_m, \ldots, \xi_m}_{k \text{ times}} \Bigr) \quad \text{and} \quad y_k = \Bigl( \underbrace{\eta_1, \ldots, \eta_1}_{k \text{ times}}, \ \ldots, \ \underbrace{\eta_n, \ldots, \eta_n}_{k \text{ times}} \Bigr), \]

so that $\alpha(R_k, C_k) = \alpha(R, C)^{k^2}$, and (3.1.1) follows since the ratio of the bounds in
Theorem 1.1 is polynomial in $k$ for fixed $m$ and $n$.
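
The cloning construction is easy to reproduce on a small example. The sketch below builds the cloned margins and a cloned matrix and checks that the row and column sums match; the helper names `clone_margins` and `clone_matrix` are ours, and the cloned matrix is arranged so that the $k$ copies of each row and column are adjacent, which agrees with the block arrangement of $k^2$ copies of $Y$ up to a permutation of rows and columns.

```python
def clone_margins(R, C, k):
    """Return (R_k, C_k): each k*r_i repeated k times, each k*c_j repeated k times."""
    Rk = [k * r for r in R for _ in range(k)]
    Ck = [k * c for c in C for _ in range(k)]
    return Rk, Ck

def clone_matrix(Y, k):
    """Replace every entry of Y by a k x k block filled with that entry."""
    return [[Y[i][j] for j in range(len(Y[0])) for _ in range(k)]
            for i in range(len(Y)) for _ in range(k)]

Y = [[1, 0, 1],
     [0, 1, 0]]                      # a 0-1 matrix with margins R = (2, 1), C = (1, 1, 1)
R = [sum(row) for row in Y]
C = [sum(col) for col in zip(*Y)]

Rk, Ck = clone_margins(R, C, 2)
Yk = clone_matrix(Y, 2)
print(Rk, Ck)
print([sum(row) for row in Yk], [sum(col) for col in zip(*Yk)])
```

The two printed lines coincide, confirming that the cloned matrix has margins $(R_k, C_k)$.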



(3.2) Asymptotic repulsion in the space of matrices. A natural candidate
for an approximation of $|\Sigma(R, C)|$ is the "independence estimate"

\[ I(R, C) = \binom{mn}{N}^{-1} \prod_{i=1}^{m} \binom{n}{r_i} \prod_{j=1}^{n} \binom{m}{c_j}, \tag{3.2.1} \]

see [GC77], [G+06], and [C+08].

The intuitive meaning of (3.2.1) is as follows. Let us consider the set of all $m \times n$
matrices with 0-1 entries and with the total sum of entries equal to $N$ as a finite
probability space with the uniform measure. Let us consider the two events in this
space: the event $\mathcal{R}$ consisting of the matrices with row sums $R$ and the event $\mathcal{C}$
consisting of the matrices with column sums $C$. One can see that

\[ \Pr(\mathcal{R}) = \binom{mn}{N}^{-1} \prod_{i=1}^{m} \binom{n}{r_i}, \qquad \Pr(\mathcal{C}) = \binom{mn}{N}^{-1} \prod_{j=1}^{n} \binom{m}{c_j} \]

and that

\[ |\Sigma(R, C)| = \binom{mn}{N} \Pr(\mathcal{R} \cap \mathcal{C}). \]

Thus the value of (3.2.1) equals $|\Sigma(R, C)|$ if the events $\mathcal{R}$ and $\mathcal{C}$ are independent. It
turns out that (3.2.1) indeed approximates $|\Sigma(R, C)|$ reasonably well in the sparse
and near-uniform cases, see [G+06] and [C+08].
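
For concreteness, the independence estimate $I(R, C) = \binom{mn}{N}^{-1} \prod_i \binom{n}{r_i} \prod_j \binom{m}{c_j}$ can be evaluated exactly with rational arithmetic. The sketch below is a direct transcription of that formula; the function name `independence_estimate` is ours.

```python
from fractions import Fraction
from math import comb

def independence_estimate(R, C):
    """I(R, C) = C(mn, N)^(-1) * prod_i C(n, r_i) * prod_j C(m, c_j), as an exact fraction."""
    m, n, N = len(R), len(C), sum(R)
    value = Fraction(1, comb(m * n, N))
    for r in R:
        value *= comb(n, r)
    for c in C:
        value *= comb(m, c)
    return value

# Permutation matrices: the exact counts are 2 (size 2) and 6 (size 3).
print(independence_estimate((1, 1), (1, 1)))        # 8/3
print(independence_estimate((1, 1, 1), (1, 1, 1)))  # 243/28
```

On these two toy examples the estimate gives $8/3$ and $243/28 \approx 8.68$, to be compared with the exact counts 2 and 6 of permutation matrices.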

However, for generic $R$ and $C$, the independence estimate $I(R, C)$ overestimates
$|\Sigma(R, C)|$ by a $2^{\Omega(mn)}$ factor. To see why, let us fix some margins $R = (r_1, \ldots, r_m)$
and $C = (c_1, \ldots, c_n)$ such that not all row sums $r_i$ are equal and not all column
sums $c_j$ are equal and the conditions of Lemma 1.3 are satisfied. Let us consider
the cloned margins $R_k$ and $C_k$ as in Section 3.1.

Applying Stirling's formula, we get

\[ \lim_{k \to +\infty} I(R_k, C_k)^{1/k^2} = \exp \left\{ - mn \, H\!\left( \frac{N}{mn} \right) + n \sum_{i=1}^{m} H\!\left( \frac{r_i}{n} \right) + m \sum_{j=1}^{n} H\!\left( \frac{c_j}{m} \right) \right\}, \tag{3.2.2} \]

where $H$ is the entropy function, see Section 1.2. To compare (3.2.2) and (3.1.1)
we use Lemma 1.3 and the multivariate entropy function

\[ H(p_1, \ldots, p_k) = \sum_{i=1}^{k} p_i \ln \frac{1}{p_i}, \]

where $p_1, \ldots, p_k$ are non-negative numbers such that $p_1 + \ldots + p_k = 1$. Thus
$H(x) = H(x, 1 - x)$ for $0 \leq x \leq 1$ and we rewrite (3.2.2) as

\[ \begin{aligned} \lim_{k \to +\infty} \frac{1}{k^2} \ln I(R_k, C_k) = \ & N H\!\left( \frac{r_1}{N}, \ldots, \frac{r_m}{N} \right) + (mn - N) H\!\left( \frac{n - r_1}{mn - N}, \ldots, \frac{n - r_m}{mn - N} \right) \\ & + N H\!\left( \frac{c_1}{N}, \ldots, \frac{c_n}{N} \right) + (mn - N) H\!\left( \frac{m - c_1}{mn - N}, \ldots, \frac{m - c_n}{mn - N} \right) \\ & - N \ln N - (mn - N) \ln (mn - N). \end{aligned} \]

On the other hand, applying Lemma 1.3, we can rewrite (3.1.1) as

\[ \begin{aligned} \lim_{k \to +\infty} \frac{1}{k^2} \ln |\Sigma(R_k, C_k)| = \ & N H\!\left( \frac{z_{ij}}{N} ; \ i, j \right) + (mn - N) H\!\left( \frac{1 - z_{ij}}{mn - N} ; \ i, j \right) \\ & - N \ln N - (mn - N) \ln (mn - N), \end{aligned} \]

where $Z = (z_{ij})$ is the maximum entropy matrix for margins $(R, C)$.

We now use some classical entropy inequalities, see, for example, [Kh57]. Namely,
by the inequality relating the entropies of two partitions of a probability space and
the entropy of their intersection, we have

\[ H\!\left( \frac{z_{ij}}{N} ; \ i, j \right) \leq H\!\left( \frac{r_1}{N}, \ldots, \frac{r_m}{N} \right) + H\!\left( \frac{c_1}{N}, \ldots, \frac{c_n}{N} \right) \]

with the equality if and only if

\[ z_{ij} = \frac{r_i c_j}{N} \quad \text{for all } i, j \tag{3.2.3} \]

and

\[ H\!\left( \frac{1 - z_{ij}}{mn - N} ; \ i, j \right) \leq H\!\left( \frac{n - r_1}{mn - N}, \ldots, \frac{n - r_m}{mn - N} \right) + H\!\left( \frac{m - c_1}{mn - N}, \ldots, \frac{m - c_n}{mn - N} \right) \]

with the equality if and only if

\[ 1 - z_{ij} = \frac{(n - r_i)(m - c_j)}{mn - N} \quad \text{for all } i, j. \tag{3.2.4} \]

However, if we have both (3.2.3) and (3.2.4), we must have $(r_i m - N)(c_j n - N) = 0$ for all $i, j$,
so unless all row sums $r_i$ are equal or all column sums $c_j$ are equal, we have

\[ \lim_{k \to +\infty} |\Sigma(R_k, C_k)|^{1/k^2} < \lim_{k \to +\infty} I(R_k, C_k)^{1/k^2}. \]

Therefore, as $k$ grows, the independence estimate (3.2.1) overestimates the number
of 0-1 matrices with row sums $R_k$ and column sums $C_k$ by a factor of $2^{\Omega(k^2)}$. In
probabilistic terms, as $k$ grows, the event $\mathcal{R}_k$ consisting of the 0-1 matrices with row
sums $R_k$ and the event $\mathcal{C}_k$ consisting of the 0-1 matrices with column sums $C_k$ repel
each other (the events are negatively correlated) instead of being asymptotically
independent.

The procedure of cloning described in Section 3.1 produces margins of increasing
size with the following features: the density remains separated from 0 and 1, and if
the margins were non-uniform initially, they stay away from uniform. One can show
that for more general sequences of margins that share these two features, we have
the asymptotic repulsion of the event consisting of the 0-1 matrices with prescribed
row sums and the event consisting of the 0-1 matrices with prescribed column sums.
This is in contrast to the case of contingency tables (non-negative integer matrices
with prescribed row and column sums), where we have the asymptotic attraction
of the events [Ba09].

(3.3) Randomized counting and sampling. Jerrum, Sinclair, and Vigoda
[J+04] showed how to apply their algorithm for computing the permanent of a
non-negative matrix to construct a fully polynomial randomized approximation
scheme (FPRAS) to compute $|\Sigma(R, C)|$ and, more generally, $|\Sigma(R, C; W)|$, where
$W$ is a 0-1 pattern, see also [B+07]. Furthermore, they obtained a polynomial time
algorithm for sampling a random $D \in \Sigma(R, C)$ and $D \in \Sigma(R, C; W)$ from a "nearly
uniform" distribution. This problem arises naturally in statistics, see, for example,
[C+05]. The estimates of Theorem 1.1 and Theorem 2.1 are not nearly as precise
as those of [J+04], but they are deterministic, easily computable, and amenable to
analysis. Similarly, we do not provide a sampling algorithm but show instead in
Theorems 1.4 and 2.4 what a random matrix is likely to look like.

(3.4) An open question. Theorem 1.5 allows us to interpret Theorem 1.4 as a
law of large numbers for binary contingency tables: with respect to sums $\sigma_S(D)$
for sufficiently "heavy" sets $S$ of indices, a random binary contingency table
$D \in \Sigma(R, C)$ behaves approximately as the matrix of independent Bernoulli random
variables whose expectation is the maximum entropy matrix $Z = (z_{ij})$. Similar
concentration results can be obtained for other well-behaved functions on binary
contingency tables. One can ask whether the distribution of a particular entry $d_{ij}$ of
a random table $D \in \Sigma(R, C)$ converges in distribution to the Bernoulli distribution
with expectation $z_{ij}$ as the dimensions $m$ and $n$ of the table grow in some regular
way, for example, when the margins are cloned as in Section 3.1.

Our approach, based on estimating combinatorial quantities via solutions to 
optimization problems, reminds one of that of Gurvits [Gu08]. The appearance 
of entropy in combinatorial counting problems reminds one of recent papers of 
Cuckler and Kahn [CK09a], [CK09b], although methods and results seem to be 
quite different. 

In the rest of the paper, we prove the results stated in Sections 1 and 2. 

4. Preliminaries: permanents and scaling 

Let $A = (a_{ij})$ be an $n \times n$ matrix. The permanent of $A$ is defined by the
expression

\[ \operatorname{per} A = \sum_{\sigma \in S_n} \prod_{i=1}^{n} a_{i \sigma(i)}, \]

where $S_n$ is the symmetric group of all permutations of the set $\{1, \ldots, n\}$. The
relevance of permanents to us is that both values $|\Sigma(R, C)|$ and $|\Sigma(R, C; W)|$
can be expressed as permanents of $mn \times mn$ matrices. This result is not new: for
$|\Sigma(R, C)|$ it was observed, for example, in [JS90]. For $|\Sigma(R, C; W)|$, where $W$ is a
0-1 pattern, a construction is presented in [J+04]. We give a general construction
for $|\Sigma(R, C; W)|$, where $W$ is an arbitrary matrix, which is slightly different from
that of [J+04].

(4.1) Lemma. Let us choose margins $R = (r_1, \ldots, r_m)$, $C = (c_1, \ldots, c_n)$ and
an $m \times n$ matrix $W = (w_{ij})$ of weights. Let us construct an $mn \times mn$ matrix
$A = A(R, C; W)$ as follows.

The rows of $A$ are split into disjoint blocks:

$m$ blocks having $n - r_1, \ldots, n - r_m$ rows respectively (blocks of type I)
and
$n$ blocks having $c_1, \ldots, c_n$ rows respectively (blocks of type II).

The columns of $A$ are split into $m$ disjoint blocks of $n$ columns each.

For $i = 1, \ldots, m$ the entry of $A$ that lies in a row from the $i$-th block of rows of
type I and in a column from the $i$-th block of columns is equal to 1.

For $i = 1, \ldots, m$ and $j = 1, \ldots, n$ the entry of $A$ that lies in a row from the $j$-th
block of rows of type II and the $j$-th column from the $i$-th block of columns is equal
to $w_{ij}$.

All other entries of $A$ are 0s.
Then

\[ |\Sigma(R, C; W)| = \left( \prod_{i=1}^{m} \frac{1}{(n - r_i)!} \right) \left( \prod_{j=1}^{n} \frac{1}{c_j!} \right) \operatorname{per} A. \]
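
The construction of Lemma 4.1 can be tried out on tiny instances. The sketch below builds the matrix $A(R, C; W)$ block by block and evaluates the permanent directly from its definition, then divides by $\prod_i (n - r_i)! \prod_j c_j!$. The helper names are ours, integer weights are assumed so that integer division is exact, and the expansion over all $(mn)!$ permutations is feasible only for very small $m$ and $n$.

```python
from itertools import permutations
from math import factorial, prod

def permanent(A):
    """Permanent by direct expansion over permutations (tiny matrices only)."""
    n = len(A)
    return sum(prod(A[i][p[i]] for i in range(n)) for p in permutations(range(n)))

def build_A(R, C, W):
    """The mn x mn matrix A(R, C; W) described in Lemma 4.1."""
    m, n = len(R), len(C)

    def col(i, j):
        return i * n + j               # j-th column of the i-th block of columns

    rows = []
    for i in range(m):                 # type I: n - r_i identical rows
        row = [0] * (m * n)
        for j in range(n):
            row[col(i, j)] = 1
        rows += [row] * (n - R[i])
    for j in range(n):                 # type II: c_j identical rows
        row = [0] * (m * n)
        for i in range(m):
            row[col(i, j)] = W[i][j]
        rows += [row] * C[j]
    return rows

def count_via_permanent(R, C, W):
    """|Sigma(R, C; W)| computed as per A / (prod (n - r_i)! * prod c_j!)."""
    n = len(C)
    denom = prod(factorial(n - r) for r in R) * prod(factorial(c) for c in C)
    return permanent(build_A(R, C, W)) // denom

ones = [[1, 1, 1], [1, 1, 1]]
print(count_via_permanent((2, 1), (1, 1, 1), ones))
```

For $R = (2, 1)$ and $C = (1, 1, 1)$ there are exactly three tables (one for each position of the single 1 in the second row), and the permanent formula reproduces this count.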



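Proof. First, we express $|\Sigma(R, C; W)|$ as a coefficient in a certain polynomial. Let
$x_1, \ldots, x_n$ be formal variables and let

\[ e_r(x_1, \ldots, x_n) = \sum_{1 \leq i_1 < \ldots < i_r \leq n} x_{i_1} \cdots x_{i_r} \]

be the elementary symmetric polynomial of degree $r$. Thus $e_r(x_1, \ldots, x_n)$ is
the coefficient of $t^{n - r}$ in the product $\prod_{j=1}^{n} (t + x_j)$.
We observe that $|\Sigma(R, C; W)|$ is
the coefficient of $\prod_{j=1}^{n} x_j^{c_j}$ in the product $\prod_{i=1}^{m} e_{r_i}(w_{i1} x_1, \ldots, w_{in} x_n)$.

Summarizing, we conclude that $|\Sigma(R, C; W)|$ is
the coefficient of $\prod_{i=1}^{m} t_i^{n - r_i} \prod_{j=1}^{n} x_j^{c_j}$ in the product $\prod_{i=1}^{m} \prod_{j=1}^{n} (t_i + w_{ij} x_j)$.

To express the last coefficient as the permanent of a matrix, we use a convenient
scalar product in the space of polynomials, see, for example, [Ba96] and [Ba07].
Namely, for monomials

\[ x^a = x_1^{a_1} \cdots x_n^{a_n} \quad \text{where } a = (a_1, \ldots, a_n) \text{ and } x = (x_1, \ldots, x_n) \]

we define

\[ \langle x^a, x^b \rangle = \begin{cases} a_1! \cdots a_n! & \text{if } a = b = (a_1, \ldots, a_n), \\ 0 & \text{if } a \neq b \end{cases} \]

and then extend the scalar product $\langle \cdot, \cdot \rangle$ by bilinearity. Equivalently, the scalar
product can be defined as follows: let us identify $\mathbb{R}^n \oplus \mathbb{R}^n = \mathbb{C}^n$ via $x + iy = z$ and
let $\nu_n$ be the Gaussian measure on $\mathbb{C}^n$ with the density

\[ \pi^{-n} e^{-\|z\|^2} \quad \text{where } \|z\|^2 = \|x\|^2 + \|y\|^2 \text{ for } z = x + iy. \]

Then, for polynomials $f$ and $g$ we have

\[ \langle f, g \rangle = \int_{\mathbb{C}^n} f(z) \overline{g(z)} \, d\nu_n, \]

where $\overline{g}$ is the complex conjugate of $g$, see for example, Section 4 of [Ba07].
The convenient property of the scalar product is that if

\[ p(x) = \prod_{i=1}^{m} \sum_{k=1}^{n} b_{ik} x_k \quad \text{and} \quad q(x) = \prod_{i=1}^{m} \sum_{k=1}^{n} c_{ik} x_k \]

are products of linear forms, then

\[ \langle p, q \rangle = \operatorname{per} D, \]

where $D = (d_{ij})$ is the $m \times m$ matrix defined by

\[ d_{ij} = \sum_{k=1}^{n} b_{ik} c_{jk} \quad \text{for all } i, j, \]

see Lemma 4.5 of [Ba07] or, for a more general identity, Theorem 3.8 of [Gu04].
Both the monomial $\prod_i t_i^{n - r_i} \prod_j x_j^{c_j}$ and the product $\prod_{i,j} (t_i + w_{ij} x_j)$ are products of $mn$
linear forms in the variables $t_1, \ldots, t_m, x_1, \ldots, x_n$, and the matrix of the pairwise scalar
products of their coefficient vectors is the matrix $A$. Thus we may write

\[ \begin{aligned} |\Sigma(R, C; W)| &= \left( \prod_{i=1}^{m} \frac{1}{(n - r_i)!} \right) \left( \prod_{j=1}^{n} \frac{1}{c_j!} \right) \left\langle \prod_{i=1}^{m} t_i^{n - r_i} \prod_{j=1}^{n} x_j^{c_j}, \ \prod_{i=1}^{m} \prod_{j=1}^{n} (t_i + w_{ij} x_j) \right\rangle \\ &= \left( \prod_{i=1}^{m} \frac{1}{(n - r_i)!} \right) \left( \prod_{j=1}^{n} \frac{1}{c_j!} \right) \operatorname{per} A. \end{aligned} \]

□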
(4.2) Matrix scaling and the van der Waerden bound. Let $B = (b_{ij})$ be an
$n \times n$ matrix. Matrix $B$ is called doubly stochastic if

\[ \sum_{j=1}^{n} b_{ij} = 1 \ \text{ for } i = 1, \ldots, n, \qquad \sum_{i=1}^{n} b_{ij} = 1 \ \text{ for } j = 1, \ldots, n \qquad \text{and} \qquad b_{ij} \geq 0 \ \text{ for all } i, j. \]

The classical bound conjectured by van der Waerden and proved by Falikman and
Egorychev, see Chapter 12 of [LW01] and also [Gu08] for exciting new developments,
states that

\[ \operatorname{per} B \geq \frac{n!}{n^n} \]

if $B$ is a doubly stochastic matrix.

Linial, Samorodnitsky, and Wigderson [L+00] introduced the following very
useful scaling method of approximating permanents of non-negative matrices. Given
a non-negative $n \times n$ matrix $A = (a_{ij})$ one finds non-negative numbers $\lambda_1, \ldots, \lambda_n$
and $\mu_1, \ldots, \mu_n$ and a doubly stochastic matrix $B = (b_{ij})$ such that

\[ a_{ij} = \lambda_i \mu_j b_{ij} \quad \text{for all } i, j. \]

Then

\[ \operatorname{per} A = \left( \prod_{i=1}^{n} \lambda_i \right) \left( \prod_{j=1}^{n} \mu_j \right) \operatorname{per} B \]

and an estimate of $\operatorname{per} B$ (such as the van der Waerden estimate) implies an estimate
of $\operatorname{per} A$. If $A$ is strictly positive, such a doubly stochastic matrix $B$ and scaling factors
$\lambda_i$, $\mu_j$ always exist. In our situation, the matrix $A$ constructed in Lemma 4.1 is only
non-negative. We will not always be able to scale it to a doubly stochastic matrix
$B$ exactly, but we will scale it approximately.
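
To make the scaling idea concrete, the following sketch scales a strictly positive $3 \times 3$ matrix to an (approximately) doubly stochastic one and checks both the identity $\operatorname{per} A = (\prod_i \lambda_i)(\prod_j \mu_j) \operatorname{per} B$ and the van der Waerden bound. It uses plain alternating row and column normalization (Sinkhorn iteration), which is our choice for simplicity and not necessarily the procedure of [L+00]; the matrix `A`, the iteration count and the helper names are illustrative assumptions.

```python
from itertools import permutations
from math import factorial, prod

def permanent(A):
    """Permanent by direct expansion over permutations (tiny matrices only)."""
    n = len(A)
    return sum(prod(A[i][p[i]] for i in range(n)) for p in permutations(range(n)))

def sinkhorn(A, iters=200):
    """Alternately normalize rows and columns of a strictly positive matrix.

    Returns (lam, mu, B) with a_ij = lam_i * mu_j * b_ij at every stage.
    """
    n = len(A)
    lam, mu = [1.0] * n, [1.0] * n
    B = [row[:] for row in A]
    for _ in range(iters):
        for i in range(n):                 # make every row sum equal to 1
            r = sum(B[i])
            lam[i] *= r
            B[i] = [b / r for b in B[i]]
        for j in range(n):                 # make every column sum equal to 1
            c = sum(B[i][j] for i in range(n))
            mu[j] *= c
            for i in range(n):
                B[i][j] /= c
    return lam, mu, B

A = [[1.0, 2.0, 3.0],
     [4.0, 1.0, 1.0],
     [2.0, 5.0, 1.0]]
lam, mu, B = sinkhorn(A)
n = len(A)
print(permanent(A), prod(lam) * prod(mu) * permanent(B))
print(permanent(B), factorial(n) / n ** n)
```

The first line prints two equal numbers up to rounding, and the second shows that $\operatorname{per} B$ is at least $n!/n^n$, as the van der Waerden bound predicts for a doubly stochastic matrix.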

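We restate a weaker form of Proposition 5.1 from [L+00] regarding almost doubly
stochastic matrices.

(4.3) Lemma. For any $n$ there exists an $\varepsilon_0 = \varepsilon_0(n) > 0$ and a function $\phi(\varepsilon)$,
$0 < \varepsilon < \varepsilon_0$, such that

\[ \lim_{\varepsilon \to 0+} \phi(\varepsilon) = 1 \]

and for any $n \times n$ non-negative matrix $B = (b_{ij})$ such that

\[ \sum_{i=1}^{n} b_{ij} = 1 \quad \text{for } j = 1, \ldots, n \]

and

\[ 1 - \varepsilon \leq \sum_{j=1}^{n} b_{ij} \leq 1 + \varepsilon \quad \text{for } i = 1, \ldots, n \]

for some $0 < \varepsilon < \varepsilon_0$, we have

\[ \operatorname{per} B \geq \frac{n!}{n^n} \phi(\varepsilon). \]

□

From [L+00], one can choose $\varepsilon_0 = 1/n$ and $\phi(\varepsilon) = (1 - \varepsilon n)^n$.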

5. Proofs of Theorems 1.1 and 2.1 

We prove Theorem 2.1 only since Theorem 1.1 is a particular case of Theorem
2.1. We start with a straightforward observation.

(5.1) Lemma. We have

\[ \prod_{i,j} (1 + w_{ij} x_i y_j) = \sum_{(R, C)} |\Sigma(R, C; W)| \, x^R y^C, \quad \text{where} \]

\[ x^R = x_1^{r_1} \cdots x_m^{r_m}, \qquad y^C = y_1^{c_1} \cdots y_n^{c_n} \]

and the sum is taken over all margins $R, C$.

□
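
The identity of Lemma 5.1 is just the expansion of the product: choosing the term $w_{ij} x_i y_j$ from the factor $(i, j)$ corresponds to setting $d_{ij} = 1$. The sketch below verifies this numerically at one point by summing over all 0-1 matrices $D$; the weights and the evaluation point are arbitrary illustrative values.

```python
from itertools import product
from math import prod

def lhs(W, x, y):
    """The product of (1 + w_ij x_i y_j) over all i, j."""
    return prod(1 + W[i][j] * x[i] * y[j]
                for i in range(len(x)) for j in range(len(y)))

def rhs(W, x, y):
    """Sum over all 0-1 matrices D of (prod of w_ij over d_ij = 1) * x^{R(D)} * y^{C(D)}."""
    m, n = len(x), len(y)
    total = 0.0
    for bits in product((0, 1), repeat=m * n):
        term = 1.0
        for i in range(m):
            for j in range(n):
                if bits[i * n + j]:
                    # each 1 in position (i, j) contributes w_ij * x_i * y_j
                    term *= W[i][j] * x[i] * y[j]
        total += term
    return total

W = [[1, 0.5, 2], [0, 1, 3]]          # illustrative non-negative weights
x, y = [0.7, 1.3], [0.2, 1.1, 0.9]    # illustrative positive point
print(lhs(W, x, y), rhs(W, x, y))
```

Grouping the terms of the right-hand side by the margins of $D$ gives the sum $\sum_{(R, C)} |\Sigma(R, C; W)| x^R y^C$; since every term is non-negative for positive $x, y$, each single term is bounded by the whole product, which is the source of the upper bound in Theorem 2.1.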

Next, we need a technical lemma. 

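(5.2) Lemma. Let $W = (w_{ij})$ be an $m \times n$ non-negative matrix such that

\[ \alpha(R, C; W) > 0. \]

Then, for any $\varepsilon > 0$ there exist points $x = x(\varepsilon)$ and $y = y(\varepsilon)$, $x = (x_1, \ldots, x_m)$ and
$y = (y_1, \ldots, y_n)$, such that

\[ \left| -r_i + \sum_{j=1}^{n} \frac{w_{ij} x_i y_j}{1 + w_{ij} x_i y_j} \right| \leq \varepsilon \quad \text{for } i = 1, \ldots, m, \]

\[ \left| -c_j + \sum_{i=1}^{m} \frac{w_{ij} x_i y_j}{1 + w_{ij} x_i y_j} \right| \leq \varepsilon \quad \text{for } j = 1, \ldots, n \quad \text{and} \]

\[ x_i, y_j > 0 \quad \text{for all } i, j. \]

Proof. Let us consider the function

\[ G(s, t; W) = - \sum_{i=1}^{m} r_i s_i - \sum_{j=1}^{n} c_j t_j + \sum_{i,j} \ln \left( 1 + w_{ij} e^{s_i + t_j} \right) \]

for $s = (s_1, \ldots, s_m)$ and $t = (t_1, \ldots, t_n)$.

Then $G(s, t; W)$ is convex and

\[ \inf_{s \in \mathbb{R}^m, \ t \in \mathbb{R}^n} G(s, t; W) = \ln \alpha(R, C; W) > -\infty. \]

Hence $G(s, t; W)$ is bounded from below. It is also easy to check that the Hessian of $G$
remains bounded on $\mathbb{R}^m \times \mathbb{R}^n$. Therefore, the gradient of $G(s, t; W)$ can get arbitrarily
close to 0. That is, for any $\varepsilon > 0$ there are points

\[ s(\varepsilon) = (s_1(\varepsilon), \ldots, s_m(\varepsilon)) \quad \text{and} \quad t(\varepsilon) = (t_1(\varepsilon), \ldots, t_n(\varepsilon)) \]

such that

\[ \left| \frac{\partial}{\partial s_i} G(s, t; W) \Big|_{s = s(\varepsilon), \, t = t(\varepsilon)} \right| \leq \varepsilon \quad \text{for } i = 1, \ldots, m \quad \text{and} \]

\[ \left| \frac{\partial}{\partial t_j} G(s, t; W) \Big|_{s = s(\varepsilon), \, t = t(\varepsilon)} \right| \leq \varepsilon \quad \text{for } j = 1, \ldots, n \]

(it suffices to choose $s(\varepsilon)$ and $t(\varepsilon)$ so that the value of $G(s(\varepsilon), t(\varepsilon); W)$ is sufficiently
close to the infimum). In other words,

\[ \left| -r_i + \sum_{j=1}^{n} \frac{w_{ij} e^{s_i(\varepsilon) + t_j(\varepsilon)}}{1 + w_{ij} e^{s_i(\varepsilon) + t_j(\varepsilon)}} \right| \leq \varepsilon \quad \text{for } i = 1, \ldots, m \]

and

\[ \left| -c_j + \sum_{i=1}^{m} \frac{w_{ij} e^{s_i(\varepsilon) + t_j(\varepsilon)}}{1 + w_{ij} e^{s_i(\varepsilon) + t_j(\varepsilon)}} \right| \leq \varepsilon \quad \text{for } j = 1, \ldots, n. \]

We now let

\[ x_i = x_i(\varepsilon) = e^{s_i(\varepsilon)} \ \text{ for } i = 1, \ldots, m \quad \text{and} \quad y_j = y_j(\varepsilon) = e^{t_j(\varepsilon)} \ \text{ for } j = 1, \ldots, n. \]

□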



(5.3) Proof of Theorem 2.1. The upper bound 

a(i2,C; W) > |E(i?,C; W)| 

follows from Lemma 5.1. Let us prove the lower bound. 

If a(R,C;W) = then |E(i2,C; W)\ = and the lower bound follows. Hence 
we assume that a(R, C; W) > 0. 

Let A = A(R, C; W) be the mn x mn block matrix constructed in Lemma 4.1. 
Let us consider the mn x mn block matrix £?(e) obtained from A as follows. For 
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e > 0, let x(e) = (xi, . . . , x m ) and y(e) = (j/i, . . . , y n ) be the point constructed in 
Lemma 5.2. 

For i = 1, . . . , m we multiply every row of A in the i-th block of type I by 

1 



Xi(n - n) ' 

For j = 1, . . . , n we multiply every row of A in the j-th block of type II by 

— for j = l,...,n; 

For i = l,... , m and j = 1, . . . , n we multiply the j-th column in the i-th block 
of columns of A by 

X 7 ; 



1 + WijXiUj 

This choice of scaling factors is, basically, a lucky guess made in the hope to match 
the structure of the function F(x, y; W). 

Thus we have 



i 3 3 

Kl=l J \j = l 



X 11^ 1 ^ + w v x iVi) I P erS ( e ) 



and hence 



f = i {n-Ti)\ 



n cj 
C 3 



\j=i C i 



(5.3.1) 



xF(x(e),y(e);W) perS(e) 



> 



x W) perS(e). 

Finally, we claim that B(e) is close to a doubly stochastic matrix. Indeed, 

For i = 1, . . . , m and j = 1, . . . , n the entry of B(e) that lies in a row from the 
i-th block of rows of type I and in the j-th column from the z-th block of columns 
is equal to 

1 



n Cj 
C 3 



(n - ri)(l + WijXiUj) 
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For i = 1, . . . , m and j = 1, . . . , n the entry of B(e) that lies in a row from the 
j-th block of rows of type II and the j-th column from the i-th block of columns is 
equal to 

Cj(l + WijXiVj)' 

All other entries of B{e) are Os. 

Let us compute the row sums of B(e). 

For a row in the z-th block of rows of type I the sum equals 



(n-r^il + WijXiy,)' 
Since 

1 + 



p\ 1 + WijXiVj j-^ 1 + j-^ 1 + 

by the inequalities of Lemma 5.2, we have 

\cii — 11 < — - — < e for i = l,...,m. 

For a row in the j-th block of rows of type II the sum equals 

By the inequalities of Lemma 5.2, we have 

\bj — 1| < — < e for j = l,...,n. 



Let us compute the column sums of $B(\epsilon)$.

For the $j$-th column from the $i$-th block of columns the sum equals

$$(n - r_i) \frac{1}{(n - r_i)(1 + w_{ij} x_i y_j)} + c_j \frac{w_{ij} x_i y_j}{c_j (1 + w_{ij} x_i y_j)} = 1.$$

Clearly, $B(\epsilon)$ is non-negative and hence by Lemma 4.3, we have

$$\operatorname{per} B(\epsilon) \ge \frac{(mn)!}{(mn)^{mn}} \delta(\epsilon) \quad \text{where} \quad \lim_{\epsilon \to 0+} \delta(\epsilon) = 1.$$

The proof now follows by (5.3.1) as $\epsilon \to 0+$. □
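
The near double stochasticity of the scaled matrix can be checked numerically on a small instance. The following is only a sketch under stated assumptions: the names `scaled_matrix` and `row_and_column_sums` and the toy data are hypothetical, all entries of W are 1, and the type II entries are taken to be w_ij x_i y_j / (c_j (1 + w_ij x_i y_j)), the value that makes every column sum equal to 1. It builds the matrix directly from the entry formulas and returns its row and column sums.

```python
def scaled_matrix(R, C, W, x, y):
    """Build the mn x mn matrix from the entry formulas of the scaled matrix.

    Rows: for each i, (n - r_i) identical rows of type I; for each j,
    c_j identical rows of type II. Columns: m blocks of n columns each,
    with the j-th column of the i-th block stored at index i * n + j.
    """
    m, n = len(R), len(C)
    assert sum(R) == sum(C), "row and column sums must have the same total"
    B = []
    for i in range(m):
        row = [0.0] * (m * n)
        for j in range(n):
            row[i * n + j] = 1.0 / ((n - R[i]) * (1.0 + W[i][j] * x[i] * y[j]))
        B.extend([row[:] for _ in range(n - R[i])])
    for j in range(n):
        row = [0.0] * (m * n)
        for i in range(m):
            u = W[i][j] * x[i] * y[j]
            row[i * n + j] = u / (C[j] * (1.0 + u))
        B.extend([row[:] for _ in range(C[j])])
    return B


def row_and_column_sums(B):
    rows = [sum(r) for r in B]
    cols = [sum(r[k] for r in B) for k in range(len(B[0]))]
    return rows, cols


# Toy instance (hypothetical): m = 2, n = 3, R = (1, 2), C = (1, 1, 1).
R, C = [1, 2], [1, 1, 1]
W = [[1, 1, 1], [1, 1, 1]]

# At x = (1/2, 2), y = (1, 1, 1) the sums of x_i y_j / (1 + x_i y_j) along
# rows and columns equal R and C exactly, so the row sums are 1 as well.
B = scaled_matrix(R, C, W, [0.5, 2.0], [1.0, 1.0, 1.0])
rows, cols = row_and_column_sums(B)

# At a generic positive point only the column sums are guaranteed to be 1.
B_generic = scaled_matrix(R, C, W, [0.3, 1.7], [0.9, 1.1, 2.0])
rows_generic, cols_generic = row_and_column_sums(B_generic)
```

In this toy case the matrix is exactly doubly stochastic at the chosen point, whereas at the generic point the row sums drift away from 1 while the column sums do not; this is the role played by the inequalities of Lemma 5.2.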
6. Proofs of Lemmas 1.3 and 2.3



We prove Lemma 2.3 only, since Lemma 1.3 is a particular case of Lemma 2.3.

Proof of Lemma 2.3. Since $H'(x) = \ln(1 - x) - \ln x$, the value of the derivative at
$x = 0$ is $+\infty$ (we consider the right derivative there), the value of the derivative at
$x = 1$ is $-\infty$ (we consider the left derivative there), and the value of the derivative
is finite for any $0 < x < 1$. Suppose that for the maximum entropy matrix $Z = (z_{ij})$ we
have $z_{ij} \in \{0, 1\}$ for some $i, j$ such that $w_{ij} = 1$. If $Y \in \mathcal{P}(R, C; W)$, $Y = (y_{ij})$, is
a matrix such that $0 < y_{ij} < 1$ whenever $w_{ij} = 1$, then

$$H\bigl(\epsilon Y + (1 - \epsilon) Z\bigr) > H(Z) \quad \text{for a sufficiently small} \quad \epsilon > 0,$$

which contradicts the choice of $Z$. Hence

$$0 < z_{ij} < 1 \quad \text{whenever} \quad w_{ij} = 1.$$

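Therefore, the gradient of $H(X)$ at $X = Z$ is orthogonal to the affine subspace of
matrices $X = (x_{ij})$ having row sums $R$, column sums $C$, and such that $x_{ij} = 0$
whenever $w_{ij} = 0$. Hence

$$\ln \frac{1 - z_{ij}}{z_{ij}} = \lambda_i + \mu_j \quad \text{for all } i, j \text{ such that } w_{ij} = 1 \tag{6.1}$$

and some $\lambda_1, \ldots, \lambda_m$ and $\mu_1, \ldots, \mu_n$. Hence

$$z_{ij} = \frac{e^{-\lambda_i} e^{-\mu_j}}{1 + e^{-\lambda_i} e^{-\mu_j}} \quad \text{whenever} \quad w_{ij} = 1.$$

Therefore,

$$\sum_{j:\ w_{ij} = 1} \frac{e^{-\lambda_i} e^{-\mu_j}}{1 + e^{-\lambda_i} e^{-\mu_j}} = r_i \quad \text{for} \quad i = 1, \ldots, m \quad \text{and}$$

$$\sum_{i:\ w_{ij} = 1} \frac{e^{-\lambda_i} e^{-\mu_j}}{1 + e^{-\lambda_i} e^{-\mu_j}} = c_j \quad \text{for} \quad j = 1, \ldots, n.$$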

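Therefore,

$$s^* = (-\lambda_1, \ldots, -\lambda_m) \quad \text{and} \quad t^* = (-\mu_1, \ldots, -\mu_n)$$

is a critical point of

$$G(s, t; W) = -\sum_{i=1}^m r_i s_i - \sum_{j=1}^n c_j t_j + \sum_{i,j:\ w_{ij} = 1} \ln\left(1 + e^{s_i + t_j}\right).$$

Since $G$ is convex, $(s^*, t^*)$ is also a minimum point. Therefore, the point $x^* =
(\xi_1, \ldots, \xi_m)$ and $y^* = (\eta_1, \ldots, \eta_n)$, where

$$\xi_i = e^{-\lambda_i} \quad \text{for} \quad i = 1, \ldots, m \quad \text{and} \quad \eta_j = e^{-\mu_j} \quad \text{for} \quad j = 1, \ldots, n,$$

is a minimum point of

$$F(x, y; W) = \left( \prod_{i=1}^m x_i^{-r_i} \right) \left( \prod_{j=1}^n y_j^{-c_j} \right) \prod_{i,j} \left( 1 + w_{ij} x_i y_j \right)$$

and satisfies

$$\begin{aligned}
\sum_{j=1}^n \frac{w_{ij} \xi_i \eta_j}{1 + w_{ij} \xi_i \eta_j} &= r_i \quad \text{for} \quad i = 1, \ldots, m, \\
\sum_{i=1}^m \frac{w_{ij} \xi_i \eta_j}{1 + w_{ij} \xi_i \eta_j} &= c_j \quad \text{for} \quad j = 1, \ldots, n.
\end{aligned} \tag{6.2}$$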



Conversely, if $x^* = (\xi_1, \ldots, \xi_m)$ and $y^* = (\eta_1, \ldots, \eta_n)$ is a point where the
minimum of $F(x, y; W)$ is attained, then, setting the gradient of $\ln F$ to 0, we
obtain equations (6.2). Letting

$$z_{ij} = \frac{\xi_i \eta_j}{1 + \xi_i \eta_j} \quad \text{when} \quad w_{ij} = 1$$

and $z_{ij} = 0$ when $w_{ij} = 0$, we obtain a matrix $Z \in \mathcal{P}(R, C; W)$. Moreover, the
gradient of $H(X)$ at $X = Z$ satisfies (6.1) with $\lambda_i = -\ln \xi_i$ and $\mu_j = -\ln \eta_j$, so $Z$
is the maximum entropy matrix.
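We now check:

$$\begin{aligned}
H(Z) &= -\sum_{i,j:\ w_{ij} = 1} z_{ij} \ln z_{ij} - \sum_{i,j:\ w_{ij} = 1} (1 - z_{ij}) \ln (1 - z_{ij}) \\
&= -\sum_{i,j:\ w_{ij} = 1} \frac{\xi_i \eta_j}{1 + \xi_i \eta_j} \ln \frac{\xi_i \eta_j}{1 + \xi_i \eta_j} + \sum_{i,j:\ w_{ij} = 1} \frac{1}{1 + \xi_i \eta_j} \ln \left( 1 + \xi_i \eta_j \right) \\
&= -\sum_{i,j:\ w_{ij} = 1} \frac{\xi_i \eta_j}{1 + \xi_i \eta_j} \left( \ln \xi_i + \ln \eta_j \right) + \sum_{i,j:\ w_{ij} = 1} \ln \left( 1 + \xi_i \eta_j \right) \\
&= -\sum_{i=1}^m r_i \ln \xi_i - \sum_{j=1}^n c_j \ln \eta_j + \sum_{i,j:\ w_{ij} = 1} \ln \left( 1 + \xi_i \eta_j \right)
\end{aligned}$$

by (6.2). Hence

$$H(Z) = \ln F(x^*, y^*; W)$$

and the proof follows.

□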
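
The duality between the maximum entropy matrix and the minimum of ln F can be illustrated numerically. The sketch below is not the method of the proof: it minimizes G(s, t; W) by coordinate-wise bisection, a simple scheme chosen here only for illustration, and all function names and the toy data (R = (1, 2), C = (1, 1, 1), no assigned zeros) are hypothetical. It then compares H(Z) with G(s, t; W) = ln F(e^s, e^t; W) at the computed point.

```python
import math


def sigmoid(u):
    """Numerically stable logistic function e^u / (1 + e^u)."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def solve_coordinate(others, allowed, target):
    """Bisection for s with sum over allowed k of sigmoid(s + others[k]) = target."""
    lo, hi = -50.0, 50.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        total = sum(sigmoid(mid + o) for o, w in zip(others, allowed) if w)
        if total < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def minimize_G(R, C, W, sweeps=50):
    """Coordinate-wise minimization of G(s, t; W); returns (s, t, Z)."""
    m, n = len(R), len(C)
    s, t = [0.0] * m, [0.0] * n
    for _ in range(sweeps):
        for i in range(m):
            s[i] = solve_coordinate(t, W[i], R[i])
        for j in range(n):
            t[j] = solve_coordinate(s, [W[i][j] for i in range(m)], C[j])
    Z = [[sigmoid(s[i] + t[j]) if W[i][j] else 0.0 for j in range(n)]
         for i in range(m)]
    return s, t, Z


def entropy(Z, W):
    """H(Z): sum of the binary entropies of the entries with w_ij = 1."""
    return sum(-z * math.log(z) - (1.0 - z) * math.log(1.0 - z)
               for zr, wr in zip(Z, W) for z, w in zip(zr, wr) if w)


def G(s, t, R, C, W):
    """G(s, t; W), which equals ln F(x, y; W) at x_i = e^{s_i}, y_j = e^{t_j}."""
    val = -sum(r * si for r, si in zip(R, s)) - sum(c * tj for c, tj in zip(C, t))
    for i, si in enumerate(s):
        for j, tj in enumerate(t):
            if W[i][j]:
                val += math.log1p(math.exp(si + tj))
    return val


R, C = [1, 2], [1, 1, 1]
W = [[1, 1, 1], [1, 1, 1]]
s, t, Z = minimize_G(R, C, W)
H_Z, lnF = entropy(Z, W), G(s, t, R, C, W)
```

For this instance the computed matrix has entries 1/3 in the first row and 2/3 in the second, and the two quantities `H_Z` and `lnF` agree up to rounding, as the lemma predicts.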

7. Proofs of Theorems 1.5 and 2.5 

We prove Theorem 2.5 only, since Theorem 1.5 is a particular case of Theorem 
2.5. 

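From formula (6.1), we have

$$\frac{1 - z_{ij}}{z_{ij}} = e^{\lambda_i + \mu_j} \quad \text{for all } i, j \text{ such that } w_{ij} = 1$$

and some $\lambda_1, \ldots, \lambda_m$ and $\mu_1, \ldots, \mu_n$. Then, for all $i, j$ such that $w_{ij} = 1$ and any
$d_{ij} \in \{0, 1\}$, we have

$$\Pr\left\{ x_{ij} = d_{ij} \right\} = z_{ij}^{d_{ij}} (1 - z_{ij})^{1 - d_{ij}} = (1 - z_{ij}) \left( \frac{z_{ij}}{1 - z_{ij}} \right)^{d_{ij}} = (1 - z_{ij}) e^{-(\lambda_i + \mu_j) d_{ij}}.$$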
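Consequently, for any $D \in \Sigma(R, C; W)$, $D = (d_{ij})$, we have

$$\begin{aligned}
\Pr\{X = D\} &= \prod_{i,j:\ w_{ij} = 1} (1 - z_{ij}) e^{-(\lambda_i + \mu_j) d_{ij}} \\
&= \left( \prod_{i,j:\ w_{ij} = 1} (1 - z_{ij}) \right) \left( \prod_{i=1}^m e^{-\lambda_i r_i} \right) \left( \prod_{j=1}^n e^{-\mu_j c_j} \right).
\end{aligned}$$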

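On the other hand,

$$\begin{aligned}
e^{-H(Z)} &= \prod_{i,j:\ w_{ij} = 1} z_{ij}^{z_{ij}} (1 - z_{ij})^{1 - z_{ij}} = \prod_{i,j:\ w_{ij} = 1} (1 - z_{ij}) \left( \frac{z_{ij}}{1 - z_{ij}} \right)^{z_{ij}} \\
&= \prod_{i,j:\ w_{ij} = 1} (1 - z_{ij}) e^{-(\lambda_i + \mu_j) z_{ij}} \\
&= \left( \prod_{i,j:\ w_{ij} = 1} (1 - z_{ij}) \right) \left( \prod_{i=1}^m e^{-\lambda_i r_i} \right) \left( \prod_{j=1}^n e^{-\mu_j c_j} \right),
\end{aligned}$$

which completes the proof. □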
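
The identity between Pr{X = D} and e^{-H(Z)} can be verified by brute force on a tiny instance. The sketch below assumes the toy margins R = (1, 2), C = (1, 1, 1) with no assigned zeros (hypothetical data, not from the text); by the symmetry between the columns, the maximum entropy matrix then has entries 1/3 in the first row and 2/3 in the second. It enumerates all 0-1 matrices with these margins and computes the probability of each under independent Bernoulli entries.

```python
import math
from itertools import product

# Toy instance (hypothetical): R = (1, 2), C = (1, 1, 1), no assigned zeros.
R, C = (1, 2), (1, 1, 1)
# By symmetry between the columns, the maximum entropy matrix has
# constant rows here.
Z = [[1 / 3] * 3, [2 / 3] * 3]
m, n = len(R), len(C)

# H(Z) is the sum of the binary entropies of the entries of Z.
H = sum(-z * math.log(z) - (1 - z) * math.log(1 - z) for row in Z for z in row)


def prob(D):
    """Pr{X = D} for independent Bernoulli entries with E X = Z."""
    p = 1.0
    for i in range(m):
        for j in range(n):
            p *= Z[i][j] if D[i][j] else 1.0 - Z[i][j]
    return p


# Enumerate all 0-1 matrices and keep those with row sums R and column sums C.
tables = []
for bits in product((0, 1), repeat=m * n):
    D = [bits[i * n:(i + 1) * n] for i in range(m)]
    if (all(sum(D[i]) == R[i] for i in range(m))
            and all(sum(D[i][j] for i in range(m)) == C[j] for j in range(n))):
        tables.append(D)

probs = [prob(D) for D in tables]
```

Every table with the prescribed margins receives the same probability, so conditioning X on having these margins yields the uniform distribution, and the total probability of hitting the set is the number of tables times e^{-H(Z)}.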
8. Proofs of Theorems 1.4 and 2.4

We prove Theorem 2.4 only since Theorem 1.4 is a particular case of Theorem 
2.4. 

We will use standard large deviation inequalities for bounded random variables, 
see, for example, Corollary 5.2 of [Mc89]. 

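(8.1) Lemma. Let $Y_1, \ldots, Y_k$ be independent random variables such that $0 \le Y_i \le 1$
for $i = 1, \ldots, k$. Let $Y = Y_1 + \ldots + Y_k$ and let $a = \mathbf{E}\, Y$. Then, for $0 < \epsilon < 1$ we
have

$$\Pr\{Y \ge (1 + \epsilon) a\} \le \exp\left\{ -\frac{1}{3} \epsilon^2 a \right\} \quad \text{and}$$

$$\Pr\{Y \le (1 - \epsilon) a\} \le \exp\left\{ -\frac{1}{2} \epsilon^2 a \right\}.$$

□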
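
As a sanity check, tail bounds of this type can be compared with exact binomial tails. The sketch below assumes the standard forms exp(-ε²a/3) for the upper tail and exp(-ε²a/2) for the lower tail, and specializes to identically distributed Bernoulli summands; the function names and the parameters k = 200, p = 0.25, ε = 0.5 are hypothetical choices for illustration.

```python
import math


def binomial_pmf(k, p, y):
    """Pr{Y = y} for Y ~ Binomial(k, p)."""
    return math.comb(k, y) * p ** y * (1.0 - p) ** (k - y)


def tail_check(k, p, eps):
    """Exact tails of Y ~ Binomial(k, p) next to the two exponential bounds."""
    a = k * p  # a = E Y
    upper = sum(binomial_pmf(k, p, y) for y in range(k + 1) if y >= (1 + eps) * a)
    lower = sum(binomial_pmf(k, p, y) for y in range(k + 1) if y <= (1 - eps) * a)
    upper_bound = math.exp(-eps ** 2 * a / 3)
    lower_bound = math.exp(-eps ** 2 * a / 2)
    return upper, upper_bound, lower, lower_bound


upper, upper_bound, lower, lower_bound = tail_check(k=200, p=0.25, eps=0.5)
```

The exact tails are much smaller than the bounds in this example, which is typical: the inequalities are convenient rather than tight, and only their exponential dependence on ε²a matters in what follows.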

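(8.2) Proof of Theorem 2.4. Let $X = (x_{ij})$ be the $m \times n$ matrix of independent
Bernoulli random variables such that $\mathbf{E}\, X = Z$, as in Theorem 2.5. By Theorem
2.5, the distribution of $X$ conditioned on $\Sigma(R, C; W)$ is uniform and hence

$$\Pr\left\{ D \in \Sigma(R, C; W) :\ \sigma_S(D) \le (1 - \epsilon) \sigma_S(Z) \right\} = \frac{\Pr\left\{ X :\ \sigma_S(X) \le (1 - \epsilon) \sigma_S(Z) \text{ and } X \in \Sigma(R, C; W) \right\}}{\Pr\left\{ X :\ X \in \Sigma(R, C; W) \right\}}.$$

Similarly,

$$\Pr\left\{ D \in \Sigma(R, C; W) :\ \sigma_S(D) \ge (1 + \epsilon) \sigma_S(Z) \right\} = \frac{\Pr\left\{ X :\ \sigma_S(X) \ge (1 + \epsilon) \sigma_S(Z) \text{ and } X \in \Sigma(R, C; W) \right\}}{\Pr\left\{ X :\ X \in \Sigma(R, C; W) \right\}}.$$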

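By Theorem 2.5, Lemma 2.3 and Theorem 2.1, we get

$$\begin{aligned}
\Pr\{X \in \Sigma(R, C; W)\} &= e^{-H(Z)} \left| \Sigma(R, C; W) \right| \\
&\ge \frac{(mn)!}{(mn)^{mn}} \left( \prod_{i=1}^m \frac{(n - r_i)^{n - r_i}}{(n - r_i)!} \right) \left( \prod_{j=1}^n \frac{c_j^{c_j}}{c_j!} \right) \\
&\ge (mn)^{-\gamma (m + n)}
\end{aligned}$$

for some absolute constant $\gamma > 0$.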
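
The last inequality is stated without computation. One way to see it is sketched here, assuming only the standard Stirling-type bounds $k! \ge (k/e)^k$ and $k! \le e \sqrt{k}\, (k/e)^k$ for $k \ge 1$; the explicit value $\gamma = 3/2$ is merely what this crude estimate produces and is not a constant taken from the text.

```latex
\[
  \frac{(mn)!}{(mn)^{mn}} \ge e^{-mn}
  \qquad \text{and} \qquad
  \frac{k^k}{k!} \ge \frac{e^{k-1}}{\sqrt{k}} \quad (k \ge 1).
\]
% Exponents cancel because \sum_i (n - r_i) + \sum_j c_j = (mn - N) + N = mn.
\[
  \frac{(mn)!}{(mn)^{mn}}
  \prod_{i=1}^m \frac{(n-r_i)^{n-r_i}}{(n-r_i)!}
  \prod_{j=1}^n \frac{c_j^{c_j}}{c_j!}
  \ge e^{-mn}\, e^{mn-(m+n)}
      \prod_{i=1}^m (n-r_i)^{-1/2} \prod_{j=1}^n c_j^{-1/2}
  \ge e^{-(m+n)}\, n^{-m/2}\, m^{-n/2}.
\]
% Since 0 < r_i < n and 0 < c_j < m force m, n \ge 2, we have mn \ge 4 > e.
\[
  e^{-(m+n)}\, n^{-m/2}\, m^{-n/2}
  \ge (mn)^{-(m+n)}\, (mn)^{-(m+n)/2}
  = (mn)^{-\frac{3}{2}(m+n)}.
\]
```

The point of the computation is that the exponential factors cancel exactly, so only a polynomial loss in $mn$ remains.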
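Therefore,

$$\begin{aligned}
\Pr\left\{ D \in \Sigma(R, C; W) :\ \sigma_S(D) \le (1 - \epsilon) \sigma_S(Z) \right\}
&\le (mn)^{\gamma (m + n)} \Pr\left\{ X :\ \sigma_S(X) \le (1 - \epsilon) \sigma_S(Z) \right\} \\
\text{and similarly} \qquad & \\
\Pr\left\{ D \in \Sigma(R, C; W) :\ \sigma_S(D) \ge (1 + \epsilon) \sigma_S(Z) \right\}
&\le (mn)^{\gamma (m + n)} \Pr\left\{ X :\ \sigma_S(X) \ge (1 + \epsilon) \sigma_S(Z) \right\}.
\end{aligned} \tag{8.2.1}$$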

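By Lemma 8.1,

$$\begin{aligned}
\Pr\left\{ X :\ \sigma_S(X) \le (1 - \epsilon) \sigma_S(Z) \right\} &\le \exp\left\{ -\frac{1}{2} \epsilon^2 \sigma_S(Z) \right\} \quad \text{and} \\
\Pr\left\{ X :\ \sigma_S(X) \ge (1 + \epsilon) \sigma_S(Z) \right\} &\le \exp\left\{ -\frac{1}{3} \epsilon^2 \sigma_S(Z) \right\}.
\end{aligned} \tag{8.2.2}$$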



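Hence for

$$\epsilon = \frac{\delta \ln n}{\sqrt{m}} \quad \text{and} \quad \sigma_S(Z) \ge \delta m n$$

we have

$$\epsilon^2 \sigma_S(Z) \ge \frac{\delta^2 \ln^2 n}{m} \cdot \delta m n = \delta^3 n \ln^2 n. \tag{8.2.3}$$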

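Combining (8.2.1)-(8.2.3), we conclude that for any $\kappa > 0$ and all sufficiently large
$n \ge m > q(\kappa, \delta)$ we have

$$\Pr\left\{ D \in \Sigma(R, C; W) :\ \sigma_S(D) \le (1 - \epsilon) \sigma_S(Z) \right\} \le n^{-\kappa n} \quad \text{and}$$

$$\Pr\left\{ D \in \Sigma(R, C; W) :\ \sigma_S(D) \ge (1 + \epsilon) \sigma_S(Z) \right\} \le n^{-\kappa n},$$

as required. □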
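
The final combination can be made explicit. The following is a sketch, assuming the weaker exponent $1/3$ in both tail bounds and using only $2 \le m \le n$; the resulting threshold on $\ln n$ is an illustration of why some $q(\kappa, \delta)$ exists, not the value intended in the text.

```latex
% Polynomial factor: m \le n gives mn \le n^2 and m + n \le 2n.
\[
  (mn)^{\gamma(m+n)} \le \left(n^{2}\right)^{2\gamma n} = \exp\{4\gamma\, n \ln n\}.
\]
% Exponential factor from the tail bound with \epsilon^2 \sigma_S(Z) \ge \delta^3 n \ln^2 n.
\[
  (mn)^{\gamma(m+n)} \exp\left\{-\tfrac{1}{3}\,\epsilon^{2}\sigma_S(Z)\right\}
  \le \exp\left\{ n \ln n \left( 4\gamma - \tfrac{1}{3}\,\delta^{3} \ln n \right) \right\}
  \le n^{-\kappa n}
  \quad \text{whenever} \quad
  \ln n \ge \frac{3(4\gamma + \kappa)}{\delta^{3}}.
\]
```

Since $n \ge m$, the condition on $\ln n$ holds as soon as $m$ exceeds a threshold depending only on $\kappa$ and $\delta$ (and the absolute constant $\gamma$).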